HPAVC

home *** CD-ROM | disk | FTP | other *** search

/ HPAVC / HPAVC CD-ROM.iso / BPS210.ZIP / TUTOR.ASC < prev next >

Wrap

Text File | 1991-07-25 | 26KB | 638 lines

Back Error Propagation Simulator (BPS) Brief Tutorial by Emilio A. Apey Copyright George Mason University September 1989 Table of contents 1. Introduction ......................................................................... 3 2. Walk-through an example .............................................. 3 Figure 1 ................................................................................ 4 Figure 2 ................................................................................ 5 3. Main menu ............................................................................. 7 3.1 Build/load network ............................................... 8 3.2 Specify training set ............................................. 10 3.3 Edit and view menu ............................................. 10 3.4 Save network ........................................................... 12 3.5 Find error .................................................................. 12 3.6 Number of epochs .................................................. 12 3.7 Cycle until converges ......................................... 13 3.8 Adjust learning parameters ............................ 13 3.9 Production menu ................................................... 14 3.10 Learn .......................................................................... 15 3.11 Transfer function options menu .................... 16 4. Formulas .............................................................................. 17 1) Introduction The BPS program simulates a multilayer neural network using back error propagation as learning algorithm. This simulator is intended for educational purposes and depends heavily on user interaction. It is important for the user to be familiar with the back error propagation algorithm before using BPS; in this way, the user can understand what the simulator is doing. In this brief "tutorial" most of the features of BPS will be demonstrated by building a simple example. Figure 1, overview of BPS, can be very useful in becoming familiar with BPS. Also, it is highly recommended for the user to draw the topology of the network before using the simulator, indicating the layers numbers and the unit numbers. This will be helpful when the program asks for information about layers and units. To have an idea of a network topology built by BPS, see figure 2. 2) Walk-through an example A simple example is to teach a network the XOR (exclusive OR) mapping, which is the following: input output ----- ------ 0 0 0 0 1 1 1 0 1 1 1 0 For this example there are four "training patterns," one for each mapping that the network has to learn. Each training pattern consist of an "input vector" and an "output vector." The first thing to do, before entering BPS, is to write a "training set file" and a "productions file" using an editor. The format of the training set file is very flexible; the restrictions are that an input vector has to precede its corresponding output vector, there must be at least one blank character separating each number, and there can be no blank lines. Figure 1: overview of BPS Now, for BPS to give meaningful results the user has to be careful that the dimension of the input vectors matches the number of units in the input layer of the network, and that the dimension of the output vectors matches the number of units in the output layer. If they do not match, BPS will seem to be working normally, but the results will be meaningless. Here are three training set file formats that could be used in this example: 0.0 0.0 0.1 0.0 0.0 0.0 0.0 1.0 0.9 0.1 0.0 1.0 0.0 0.9 0.0 1.0 0.1 1.0 1.0 0.1 0.9 0.0 1.0 0.0 1.0 0.9 0.9 1.0 1.0 1.0 0.1 0.0 0.9 1.0 1.0 0.1 Format 1 Format 2 Format 3 The productions file is similar to the training set file, but the output vectors are omitted; the network has to produce them. As with the training set file, the format is flexible, but there can be no blank lines. 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 1.0 1.0 1.0 1.0 1.0 0.0 0.0 1.0 1.0 1.0 1.0 Format 1 Format 2 Format 3 For the XOR example the training set file will be named XOR.DAT and the productions file will be name XOR.PRO. From the training patterns it can be seen that the network needs to have two input units (the dimension of an input vector) and one output unit (the dimension of an output vector). The number of hidden layers and hidden units is up to the user; it is good practice to keep networks as small as possible because it takes less time to train them, since there are fewer weights to update. However, if they are too small they might not be able to learn all the training patterns; this is one of the artistic aspects of neural networks. The simplest topology that can learn the XOR mapping is the following: Figure 3: XOR network topology Now it is time to get the program running and start doing something with this network. To run the program in the VAX 8530 type BPS. If you have the PC version, select the executable version that best fit your computer (bps286.exe, bps8088.exe) and type the name without the extension. gmuvax> bps Once the initial message appears, hit return to enter the main menu. 3) Main menu In this menu is where the user spends most time, here networks are defined and then they are trained. Under this menu there are three other menus: the production menu, the edit and view menu, and the transfer function menu; they will be described later on. 3.1) Build/load network (b) The first thing to do when using BPS is to define a network, because many functions depend on the network topology. To define a network choose "b" (for build) and answer the questions. Remember to hit the return key after each entry. Supposed that you have been training a network for a while and then decide to create or load a new network. BPS will ask if you want to save the old network. Then, BPS will ask if you want to load a network from your working directory: Do you want to load a network from file? (y/n): n To load a network answer "y"; then BPS will ask you for the name of a previously defined network. The network is loaded and BPS returns to the main menu. In the case that you want to define a new network, then answer "n". Next BPS will ask if you want to initialize the network with random weights. Do you want random weights? (y/n): y When the answer is "y", BPS will completely connect all the units of adjacent layers and the initial weights of the edges will be in the range that you will soon specify. When the answer is "n", BPS will not connect any unit in the network at all; it will create the layers and units, but there will not be any edges. To connect the units you would go to the Edit and View menu and add edges between the units that you want to be connected. So, at this point, BPS would create only a skeleton of the network. In the XOR example two edges will be added later on, an edge going from the first unit in the first layer to the first unit in the third layer, and an edge going from the second unit in the first layer to the first unit in the third layer (see fig. 3). But the rest of the network is fully connected, so at this point answer "y", you do want edges with random weights between the layers. Then BPS will ask you for the lower and upper limit of the random weights. A good range is -0.34 and 0.34, that is -1/3 and +1/3. Give lower limit of random interval: -0.34 Give upper limit of random interval: 0.34 To initialize the weights with numbers close to zero is good practice because if the weights are large it is very difficult, sometimes impossible, for BPS to change those weights. Next, BPS will ask you how many layers of units you want. A typical back propagation network has three layers, but if you want to experiment with more layers BPS allows you to build networks with more layers. From figure 3 you can see that the network has three layers, then answer "3". Enter number of layers: 3 Now the number of units for each layer has to be specified. BPS will start asking the number of units for the first layer (input layer) up to the layer number specified in the total number of layers (output layer). Be careful that the number of units in the input layer and output layer matches the dimensions of the input and output vectors specified in the training set. Number of nodes for layer no. 1: 2 Number of nodes for layer no. 2: 1 Number of nodes for layer no. 3: 1 Since two units are needed for the first layer type "2"; for the second layer one unit is needed, so type "1"; and for the third layer one unit is needed; type "1". BPS now returns to the main menu and it has given the default name of "unnamed" to the network. By now the network looks like this: Figure 4: XOR partial network topology 3.2) Specify training set (t) To specify or change the training set, choose "t" (for training set) from the main menu. BPS will ask you for the name of the file where you wrote the training patterns. Enter the file name of the training set: xor.dat 3.3) Edit and View menu (e) When you choose "e" (for edit and view) in the main menu the "edit and view menu" appears. In this menu you can inspect the network and modify it. The options on the left of the menu are to modify the network and the options on the right of the menu are to inspect the network. Choose "a" (for activities and thresholds) to see the activities and thresholds of the units in the network. The screen looks like this: Layer Node Activity Threshold 1 1 ----> 0.000000 0.000000 1 2 ----> 0.000000 0.000000 2 1 ----> 0.000000 0.000000 3 1 ----> 0.000000 0.000000 From this screen you can see that the first layer has two units (1,2), the second layer has one unit, and the third layer has one unit. Since this network was just defined, the activities and thresholds of the units are all zeros. Now choose "w" to inspect the weights in the edges; this is one of the most used options in BPS. The screen will look like this: Layer Node Weight Node Layer 1 1 -0.20023 1 2 1 2 0.23874 1 2 2 1 0.18973 1 3 The layers and units on the left are the source units, and the layers and nodes on the right are the target units. From this screen you can see the weights between two units. The XOR network needs two more edges, so choose "e" (for edge) to add a new edge. Edges in the network have direction; that is they come out of one unit and they go into another unit; to add an edge you have to specify the source unit and the target unit. Since BPS handles multilayer networks, each unit location is defined by the number of the layer where the unit is and the unit's number in that layer; see figure 3 for layer and unit numbers. The following entries are to add an edge from the first unit in the first layer to the first unit in the third layer: Layer number of source unit: 1 Number of source unit: 1 Layer number of target unit: 3 Number of target unit: 1 Do you want a random weight? (y/n): y Change random weight range? (y/n): n The given weight was: -0.15652 As you can see, BPS will ask you if you want a random weight; if you answer "n" BPS will ask you for a specific weight that you want to give to that edge. If you decide to add a random weight, you can change the range for the new weight. Now that a new edge has been added, you can, if you want, check it by choosing "w" (for weights) and see if the new edge is really there. To add the second edge from the second unit in the first layer to the first unit in the third layer choose "e" again; and respond to the prompts, but this time the unit number of the source unit is 2. After you have done this you can check the network by choosing "w" again. At this point the network topology should look like figure 3 above. The remaining options in the edit and view menu work similarly. When modifying the network BPS will ask you about the location of the node(s) involved, information that you can get from a diagram of the network that you are building, like figure 3. The "t" (for training set) choice is to see the training set that you specified without having to leave BPS. Now you can leave this menu by choosing "q"; this will put you back in the main menu. 3.4) Save network (s) At this point that the network is complete and you may want to save it. When a network is saved a name is given to the network; also, all the parameters that the network has at that time are saved, including the number of epochs that it has been trained. In this way, when a network is loaded, all the parameters are loaded into BPS at the same time. Name for file to save network: xor211.net The user can give any name to the network; XOR211.NET describes the network as with two input units, one hidden unit, and one output unit. 3.5) Find error (f) If you want to know what is the current value of the network error function, choose "f" (for find error) from the main menu. The new error will be displayed in the main menu where it says "error." If you do not choose "f" and keep training the network, the error displayed in the main menu is not the present error, but it is the error when you requested it the last time; this operation was designed to be user activated. Initially the error will be zero, but this does not means that the error of the network is zero; this is just an initialization. 3.6) Number of epochs (n) This number has two different purposes, depending on which learning mode you choose to train the network. One of these modes is to let the learning algorithm to cycle until a specified error is reached; see next section for the other mode. "Cycle until converges" means that the network will be train until an user specified error is reached. In case that the user specified error can not be reached, BPS will exhaust the number of epochs specified when "n" was chosen. In other words, the network will be train until either it reaches an error or it uses all the specified epochs. When you choose "n", BPS will ask you for the number of epochs to train the network. Enter the number of epochs to train the network: 1000 With what we have specified up to now we could attempt to train this network, but let's see the other options first. 3.7) Cycle until converges (c) When "c" is choose, you can turn the "cycle until converges" mode ON or OFF. If you turn it ON, then BPS will ask you for the target error; if you turn it OFF, BPS will return to the main menu. The other mode of learning is active when "cycle until converges" is OFF. It will train the network for the specified number of epochs, without paying attention to the error of the network. For the XOR example leave "cycle until converges" OFF; so by now you do not need to choose "c". 3.8) Adjust learning parameters (a) This is the fun part of BPS, adjusting the learning parameters and see with which set of parameters the network learns faster, or if it learns at all. The back propagation algorithm uses two parameters to adjust the weights in a network: eta (learning rate) and alpha (momentum factor); see section 4. BPS has a speed up mechanism that adjusts these two parameters after each epoch; this mechanism introduces two new variables: an upper bound for eta and an increasing rate for eta (see section 4). These four variables can be set by the user. They default to conservative values, so most networks will slowly learn the patterns. For some topologies a larger bound for eta, 1.0, will allow the network to learn faster; instead for other topologies, a large eat would lead them to instability and their weights will become very large. By experience it has been found that if a network's weights after a few thousand epochs have become large (above 10) and the error is still large, probably that network will not learn the mapping. However, if after a few thousand epochs the network has not learn the mapping but its weights are still small (below 2), there is still a chance for the network to learn the mapping. It is good practice to save a network as soon as you define it. In this way, if the network does not learn with a set of learning parameters, you can load the original network and try another set of learning parameters; this allows you to start from the same initial state. If the network still does not learn, them you can try another initial state (another set of random weights) by creating a new network. And if the network still does not learn, then you can try another topology: add units or edges. As you can see, there is some testing to be done with these networks; but the more experiments that you do, the more familiar you will become with their behavior. When you choose "a" (for adjust learning parameters), you can change the values of these four variables: eta, alpha, bound of eta, and increasing rate of eta. For the XOR problem the default values will be used; so once again, you do not need to choose "a". 3.9) Production menu (p) Before training the network let's see what kind of output it is producing. To do this choose "p" (for production menu) from the main menu. In the production menu you can do productions (propagate input vectors through the network to obtain output vectors) from a file or from the keyboard. You can also save those productions in a file. All these possibilities are shown in this menu. To do productions from the keyboard choose "k"; BPS will give you the number of input units (dimension of input vector) and ask you for the input vector: Input vector: 0.0 0.0 Output vector: 0.52343 The terms of the input vector need to have a space in between, you can also hit return and give more terms in the next line. The output at this time will be around 0.5 because the network has not been trained yet. Try other input vectors and you will see that all of the output vectors are around 0.5. Sometimes you want to give a large set of productions to the network; that is why the production file was created, because you will get tired of typing input vectors. So you can specify a production file where you have the set of productions that you want to do. To set the production file choose "p" and BPS will ask you: Enter file name: xor.pro BPS will remember this file, so you do not have to specify it every time that you enter the production menu. Sometimes you may want to have several productions file for the same network; to change the production file just choose "p" again and give the name of the new file. The "v" (for view) option is to view the production file, to check if it is the right set of productions that you want to test. After you have specified a production file you can choose "f" (for productions from file) and get many productions at once. Suppose that you want to save productions when you do them either from a file or from the keyboard. You can tell BPS to open a file to write the productions. To do this choose "s" (for save); this choice is used to open and close a file. If you want to open a file respond "y" and BPS will ask you: Save productions in a file? (y/n): y Enter file name to save productions: xor.out If you respond "n" and a saving file was open, BPS will close the file. However, you do not need to close this file specifically; if you leave the production menu and a saving file was open, BPS will close it for you. Now you know everything about the production menu, so you can choose "q" to return to the main menu. 3.10) Learn (l) Finally the network is ready to start the training. For this choose "l" (for learning) and depending on the state of the "cycle until converges" switch (see section 3.6), the learning will be in either of the two modes. Since we left this switch in OFF for this example, the learning will exhaust the specified number of epochs; the display while the network is being trained will look like this: 20 eta: 0.06 error: 0.32012345 40 eta: 0.10 error: 0.32012201 60 eta: 0.45 error: 0.31999982 80 eta: 0.60 error: 0.31999873 . . . . . . From this display you can see how the epoch number increases, how eta changes and (hopefully) how the error decreases. Probably for the first 1000 epochs the error will stay around 0.32, and may be up to 3000 epochs. But when the error starts decreasing it goes very fast. If the learning rate is high (around 0.95), the error will decrease up to a point and then stay there; if the learning rate is too high (above 1.0), the error will decrease rapidly up to a point and then it will start increasing. You may even get a running time error, "floating point excepting"; this is because the weights became extremely large. You can train the network for a while and then go to the "edit and view" menu to see how the weights have changed. Try training the same initial network with several learning parameters and see what kind of results you obtain. These exercises will help you to get an idea of how the back propagation algorithm works so you can do bigger applications afterwards. 3.11) Transfer function options menu (m) You enter this menu by choosing "m" from the main menu. This menu has two options: to change the amplitude of the sigmoid function (M) and to shift the sigmoid function downward so it is symmetrical around the x-axis. Figure 5: Sigmoid function with range 0.0, M and -M/2, M/2 By default the range of the sigmoid function is between 0.0 and 1.0. When you select "m" in this menu BPS will ask you for the new upper limit of the sigmoid function: Enter new value: 2.0 Now the range of the sigmoid function is [0.0,2.0] instead of being in the range [0.0,1.0]. Note that to be able to train a network with this new amplitude, you have to define the training patterns to be in this range also. The other option is to make the sigmoid function symmetrical with respect to the x-axis; this option can be switched ON and OFF by choosing "s" in this menu. Note that the range of the sigmoid function will be between -M/2 and M/2. For example, if you specify M = 2.0, then the range of the sigmoid function would be -1.0 and 1.0. 4) Formulas To give you more insight into what BPS is doing while a network is being trained, this section describes some of the main formulas of the learning algorithm. In these formulas p stands for training patterns. Then the summation over p means one epoch; all the training patterns have to go through. Symbols for the units are j, i, and k. The j is for the unit which parameters are being calculated; the i is for units in the previous layer, and the k is for units in the next layer. Figure 6: units and subscripts Weight adjustment: ╞Wji(t+1) = h ╖p (dpj Opi) + a ╞Wji(t) (1) where dpj = (Tpj - Opj) (Opj/M) (1 - Opj/M) for output units. dpj = (Opj/M) (1 - Opj/M) ╖k (dpk Wkj) for hidden units. O : unit activity level. T : output unit target. M : sigmoid function amplitude. A difference between formula 1 and the original back propagation formula is the summation of the products of the deltas and previous unit activation over all the training patterns, p. By doing this operation the weights will be updated only once per epoch of training. This is the first part of a two parts method proposed by Vogl to accelerate learning by back propagation. Transfer function: 1 Opj = M ( ------------------------ - 1/2 ) - ( ╖i Wji Opi + qj) / M 1 + e Error function: error = ╖p [1/2 ╖j (Tpj - Opj)**2 ] Eta and alpha adjustment: C (t) = << ╞Wji(t-1) , ╞Wji(t) >> Correlation. if ( C(t) < 0 ) then h --> 0.01 a --> 0.00 else if (C(t) < (C(t-1) + 0.05 * C(t-1)) or h < eta bound) then h --> h + eta rate a --> a + 0.01 endif endif The correlation gives an indication of how the weights are changing. If this term becomes negative, then eta and alpha are decreased drastically to 0.01 and 0.0 respectively. If the correlation is positive, then eta and alpha have passed the first test to be increased. The second test to increase eta and alpha is that if eta is below its maximum allowed value (its bound) and if the correlation is within the last correlation plus five percent. The original correlation test was implemented by David Schreibman and consisted in checking if C(t) was either positive of negative; if it was negative then the values of eta and alpha were dropped to low values, and if it was positive eta and delta were increased. The additional test of checking if the correlation is increasing or decreasing was added to give more stability to the learning.